class: center, middle, inverse, title-slide .title[ # 2023 Women in Statistics and Data Science Conference ] .subtitle[ ## Mortality Rates from Violent Deaths by Racial and Ethnic Groups in the United States, 2016-2020 ] .author[ ###
Ying-Ju Tessa Chen, PhD
(This is joint work with Dr. Tatjana Miljkovic.)
Associate Professor
Department of Mathematics
University of Dayton
@ying-ju
ying-ju
ychen4@udayton.edu
] .date[ ### October 27, 2023 ] --- ## Background --- ## Objective We investigate mortality rates from violent manners of death in underrepresented communities relative to white communities in the U.S. by studying the following: - Whether mortality rates for the identified violent manners of death differ between underrepresented communities and white communities in the U.S. - Whether sex, race, age, and the manner of death are significant factors in determining violent mortality rates in the U.S. - The relative risk of specific manners of death, including suicide, homicide, legal intervention by police and other authorities, unintentional firearm–self-inflicted, and unintentional firearm–unknown-inflicted. --- ## Datasets Used .pull-left[ - NVDRS - Manner of Death - Year - State - Sex - Age - Race - Ethnicity .left[.footnote[.blue[National Violent Death Reporting System]] ]] .pull-right[ - CDC WONDER - Year - State - Gender - Age Group - Race - Population .left[.footnote[.blue[Wide-ranging ONline Data for Epidemiologic Research]]] ] --- ## Terminology for Race and Ethnicity Used Across Sources <table> <thead> <tr> <th style="text-align:left;"> Race/Ethnicity </th> <th style="text-align:left;"> This Report </th> <th style="text-align:left;"> NVDRS </th> <th style="text-align:left;"> WONDER </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> Race </td> <td style="text-align:left;"> Asian/Asian American </td> <td style="text-align:left;"> Asian/Pacific Islander </td> <td style="text-align:left;"> Asian and Pacific Islander </td> </tr> <tr> <td style="text-align:left;"> Race </td> <td style="text-align:left;"> Black/African American </td> <td style="text-align:left;"> Black or African American </td> <td style="text-align:left;"> Black or African American </td> </tr> <tr> <td style="text-align:left;"> Race </td> <td style="text-align:left;"> Native American/Alaska Indigenous Resident </td> <td style="text-align:left;"> American 
Indian/Alaska Native </td> <td style="text-align:left;"> American Indian and Alaska Native </td> </tr> <tr> <td style="text-align:left;"> Race </td> <td style="text-align:left;"> White </td> <td style="text-align:left;"> White </td> <td style="text-align:left;"> White </td> </tr> <tr> <td style="text-align:left;"> Ethnicity </td> <td style="text-align:left;"> Hispanic/Latino </td> <td style="text-align:left;"> Hispanic </td> <td style="text-align:left;"> Hispanic or Latino </td> </tr> </tbody> </table> --- ## NVDRS State Data - 27 States (43%) <style> .custom h2, .custom p { margin-bottom: 0; } </style>
--- ## EDA <img src="data:image/png;base64,#./figures/Homicide_Race_Age.png" width="100%" style="display: block; margin: auto;" /> .caption[Homicide Mortality Rates Per 100,000 From 2016 To 2020 By Age Group And Race ] --- ## Motivation to Go Beyond Exploratory Data Analysis - Limitations of EDA - Importance of Statistical Modeling - Applications of Statistical Modeling - Conclusion --- ## Methodology - Negative Binomial GLM - `\(Y_i, i=1, 2, \ldots, N\)`: random variables for the number of deaths due to a violent event, with realizations denoted `\(y_i, i=1, 2, \ldots, N\)`. - Assume `\(Y_i|\mu_i, r_i \sim NegBin(\mu_i, r_i), i=1, 2, \ldots, N\)` - `\(X_i'=(X_{1i}, X_{2i}, \ldots, X_{pi})\)`: a `\(p\)`-dimensional vector of categorical factors with realization `\(x_i'=(x_{1i}, x_{2i}, \ldots, x_{pi})\)`, `\(i=1, 2, \ldots, N\)`. When modeling mortality rates, the negative binomial GLM can be related to the linear model for the ratio response as follows: .center[ `\(\log \left(\frac{Y_i}{p_i}\right) = \beta_1 + \beta_2 x_{1i} + \beta_3 x_{2i} + \cdots + \beta_p x_{p-1,i},\)` ] where `\(p_i\)` represents the population count associated with the number of deaths `\(Y_i\)` for the `\(i\)`th level of aggregation. 
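A minimal sketch of how a model of this form might be fitted in R, assuming `MASS::glm.nb()` with a log-population offset; the slides do not say which fitting function was used. The data are simulated, and the column names (`race`, `sex`, `pop`, `deaths`) and the effect sizes are hypothetical.

```r
# Sketch only: negative binomial GLM for death counts with a log-population
# offset, so that log(mu_i / p_i) equals the linear predictor.
# All data are simulated; names and effect sizes are illustrative assumptions.
library(MASS)  # provides glm.nb() and rnegbin()

set.seed(2023)

# Hypothetical aggregation grid: race x sex cells, 30 replicates each
cells <- expand.grid(
  race = factor(c("R1", "R2", "R3", "R4")),
  sex  = factor(c("Female", "Male")),
  rep  = 1:30
)
cells$pop <- round(runif(nrow(cells), 5e4, 5e5))  # population at risk

# Assumed "true" effects on the log-rate scale (baseline: R1, Female)
true_beta <- c("(Intercept)" = log(15 / 1e5),
               raceR2 = -0.5, raceR3 = -0.3, raceR4 = 0.7,
               sexMale = 1.2)
X  <- model.matrix(~ race + sex, data = cells)
mu <- cells$pop * exp(drop(X[, names(true_beta)] %*% true_beta))
cells$deaths <- rnegbin(nrow(cells), mu = mu, theta = 5)

# Fit the rate model; offset(log(pop)) fixes the population coefficient at 1
fit <- glm.nb(deaths ~ race + sex + offset(log(pop)), data = cells)

# Rate ratios relative to the baseline levels (intercept dropped)
rel_risk <- exp(coef(fit)[-1])
print(round(rel_risk, 2))
```

Exponentiating a coefficient gives the modeled rate ratio against the baseline level, so a value below 1 indicates a lower modeled rate than the baseline group.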
--- ## Statistical Analysis - GLM Modeling .pull-left-2[ .small[ <table class="table" style="margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:left;"> MODEL NAME (First 2 Characters) </th> <th style="text-align:center;"> FACTORS </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;width: 4cm; "> M1 </td> <td style="text-align:center;"> Year + State + Sex + Race (or Ethnicity) + Age Group </td> </tr> <tr> <td style="text-align:left;width: 4cm; "> M2 </td> <td style="text-align:center;"> State + Sex + Race (or Ethnicity) + Age Group </td> </tr> <tr> <td style="text-align:left;width: 4cm; "> M3 </td> <td style="text-align:center;"> Sex + Race (or Ethnicity) + Age Group </td> </tr> <tr> <td style="text-align:left;width: 4cm; "> M4 </td> <td style="text-align:center;"> Race (or Ethnicity) + Age Group </td> </tr> <tr> <td style="text-align:left;width: 4cm; "> M5 </td> <td style="text-align:center;"> Sex + Race (or Ethnicity) </td> </tr> <tr> <td style="text-align:left;width: 4cm; "> M6 </td> <td style="text-align:center;"> Race (or Ethnicity) </td> </tr> <tr> <td style="text-align:left;width: 4cm; "> M7 </td> <td style="text-align:center;"> Age Group </td> </tr> <tr> <td style="text-align:left;width: 4cm; "> M8 </td> <td style="text-align:center;"> Race (or Ethnicity) + Age Group + Age Group:Race (or Ethnicity) </td> </tr> <tr> <td style="text-align:left;width: 4cm; "> M9 </td> <td style="text-align:center;"> Age Group + Sex </td> </tr> </tbody> </table> ] ] .pull-right-2[ <img src="data:image/png;base64,#./figures/models.png" width="100%" style="display: block; margin: auto;" /> ] --- ## Analysis of the GLM Models - `Step A:` The chi-square goodness-of-fit (GOF) test is applied to all GLM models, and those that pass this test are considered further. - `Step B:` Model diagnostics are examined for all models. The diagnostic tools include the half-normal plot of residuals and the density plot. 
Diagnostic plots are analyzed along with the findings in Step A. - `Step C:` All models are additionally tested for overdispersion using the R function `check_overdispersion()` from the R package performance (0.10.3) (Lüdecke et al. 2021), and models that do not pass the test are disregarded where possible. - `Step D:` The likelihood-ratio test and Akaike information criterion (see Appendix C) are used to compare pairs of models within the selected subsets of the model space. - `Step E:` The most suitable model is selected by balancing consistency with the results across all of the steps above. - `Step F:` The selected GLM model is used to explain the results and findings of this study. --- ## Models Chosen for Suicide, Homicide, and Other <table> <thead> <tr> <th style="text-align:left;"> Model </th> <th style="text-align:center;"> Factors </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> M6-S-R </td> <td style="text-align:center;"> Race </td> </tr> <tr> <td style="text-align:left;"> M8-S-R </td> <td style="text-align:center;"> Age Group + Race + Age Group:Race </td> </tr> <tr> <td style="text-align:left;"> M5-S-E </td> <td style="text-align:center;"> Sex + Ethnicity </td> </tr> <tr> <td style="text-align:left;"> M2-H-R </td> <td style="text-align:center;"> State + Age Group + Sex + Race </td> </tr> <tr> <td style="text-align:left;"> M2-H-E </td> <td style="text-align:center;"> State + Age Group + Sex + Ethnicity </td> </tr> <tr> <td style="text-align:left;"> M2-O-R </td> <td style="text-align:center;"> State + Age Group + Sex + Race + Manner </td> </tr> <tr> <td style="text-align:left;"> M2-O-E </td> <td style="text-align:center;"> State + Age Group + Sex + Ethnicity + Manner </td> </tr> </tbody> </table> --- ## Relative Risk By Race Or Ethnicity .pull-left[ <img src="data:image/png;base64,#./figures/Risk_Race.png" width="100%" style="display: block; margin: auto;" /> .caption[No other fixed factors are used 
for suicide; fixed factors for homicide are state, age, and sex; fixed factors for “other” are state, age, sex, and manner.] ] .pull-right[ <img src="data:image/png;base64,#./figures/Risk_Ethnicity.png" width="100%" style="display: block; margin: auto;" /> .caption[The fixed factor for suicide is sex; fixed factors for homicide are state, age, and sex; fixed factors for “other” are state, age, sex, and manner.] ] --- ## Relative Risks By Ethnicity and Sex in Model M5-S-E <img src="data:image/png;base64,#./figures/Risk_M5SE.png" width="90%" style="display: block; margin: auto;" /> --- ## Wrap-Up ✅ Read text files, binary files (e.g., Excel, SAS, SPSS, Stata, etc.), JSON files, etc. online using
✅ Scrape a webpage using
✅ Understand when we can scrape data (i.e., `robots.txt`) --- ## Thanks .pull-left[ - Please do not hesitate to contact me (Tessa Chen) if you have questions pertaining to learning R or other languages. Please email me at <a href="mailto:ychen4@udayton.edu"><i class="fa fa-paper-plane fa-fw"></i> ychen4@udayton.edu</a>. - Slides were created via the R package **xaringan**, with styling based on: * [xaringanthemer](https://cran.r-project.org/web/packages/xaringanthemer/vignettes/xaringanthemer.html) package, and * Alison Hill's [@apreshill](https://github.com/apreshill/) CSS resources for customizing themes and fonts - The formatting of slides is provided by Dr. Fadel M. Megahed [@fmegahed](https://github.com/fmegahed). ] .pull-right[ <img src="data:image/png;base64,#./figures/Tessa_grey_G.gif" width="60%" style="display: block; margin: auto;" /> ] --- # Appendix - Suicide Models ## Coefficients: Suicide-Race, Model 6 (M6-S-R) <table> <thead> <tr> <th style="text-align:left;"> Variable </th> <th style="text-align:center;"> Relative Risk </th> <th style="text-align:center;"> Coefficient Estimate </th> <th style="text-align:center;"> Coefficient Standard Error </th> <th style="text-align:center;"> Z Value </th> <th style="text-align:center;"> P Value </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> Intercept </td> <td style="text-align:center;"> - </td> <td style="text-align:center;"> 2.95 </td> <td style="text-align:center;"> 0.01 </td> <td style="text-align:center;"> 197.77 </td> <td style="text-align:center;"> <0.001 </td> </tr> <tr> <td style="text-align:left;"> Race R2 </td> <td style="text-align:center;"> 63% </td> <td style="text-align:center;"> -0.46 </td> <td style="text-align:center;"> 0.03 </td> <td style="text-align:center;"> -16.55 </td> <td style="text-align:center;"> <0.001 </td> </tr> <tr> <td style="text-align:left;"> Race R3 </td> <td style="text-align:center;"> 76% </td> <td style="text-align:center;"> -0.27 </td> <td 
style="text-align:center;"> 0.03 </td> <td style="text-align:center;"> -7.77 </td> <td style="text-align:center;"> <0.001 </td> </tr> <tr> <td style="text-align:left;"> Race R4 </td> <td style="text-align:center;"> 212% </td> <td style="text-align:center;"> 0.75 </td> <td style="text-align:center;"> 0.04 </td> <td style="text-align:center;"> 18.24 </td> <td style="text-align:center;"> <0.001 </td> </tr> </tbody> </table> .right[.caption[Baseline for race is white (R1).]] <table> <thead> <tr> <th style="text-align:center;"> DF </th> <th style="text-align:center;"> DEVIANCE </th> <th style="text-align:center;"> DISPERSION </th> <th style="text-align:center;"> AIC </th> <th style="text-align:center;"> P VALUE </th> <th style="text-align:center;"> 2 × LOG-LIKELIHOOD </th> </tr> </thead> <tbody> <tr> <td style="text-align:center;"> 5976 </td> <td style="text-align:center;"> 5931.16 </td> <td style="text-align:center;"> 1.87 </td> <td style="text-align:center;"> 37091.11 </td> <td style="text-align:center;"> 0.66 </td> <td style="text-align:center;"> -37081.1 </td> </tr> </tbody> </table> --- ## Diagnostic Plots for Suicide-Race Model 6 (M6-S-R) <br></br> <img src="data:image/png;base64,#./figures/suicide_race_M6.png" width="100%" style="display: block; margin: auto;" /> .right[.caption[Blue line: empirical density from the data; red line: density from the fitted model]] --- ## Coefficients: Suicide-Ethnicity, Model 5 (M5-S-E) <table> <thead> <tr> <th style="text-align:left;"> Variable </th> <th style="text-align:center;"> Relative Risk </th> <th style="text-align:center;"> Coefficient Estimate </th> <th style="text-align:center;"> Coefficient Standard Error </th> <th style="text-align:center;"> Z Value </th> <th style="text-align:center;"> P Value </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> Intercept </td> <td style="text-align:center;"> - </td> <td style="text-align:center;"> 2.13 </td> <td style="text-align:center;"> 0.02 </td> <td 
style="text-align:center;"> 139.85 </td> <td style="text-align:center;"> <0.001 </td> </tr> <tr> <td style="text-align:left;"> Ethnicity E2 </td> <td style="text-align:center;"> 72% </td> <td style="text-align:center;"> -0.33 </td> <td style="text-align:center;"> 0.02 </td> <td style="text-align:center;"> -14.00 </td> <td style="text-align:center;"> <0.001 </td> </tr> <tr> <td style="text-align:left;"> Sex Male </td> <td style="text-align:center;"> 353% </td> <td style="text-align:center;"> 1.26 </td> <td style="text-align:center;"> 0.02 </td> <td style="text-align:center;"> 63.94 </td> <td style="text-align:center;"> <0.001 </td> </tr> </tbody> </table> .right[.caption[Baseline for ethnicity is non-Hispanic/Latino (E1), for sex is female (F).]] <table> <thead> <tr> <th style="text-align:center;"> DF </th> <th style="text-align:center;"> DEVIANCE </th> <th style="text-align:center;"> DISPERSION </th> <th style="text-align:center;"> AIC </th> <th style="text-align:center;"> P VALUE </th> <th style="text-align:center;"> 2 × LOG-LIKELIHOOD </th> </tr> </thead> <tbody> <tr> <td style="text-align:center;"> 3607 </td> <td style="text-align:center;"> 3690.25 </td> <td style="text-align:center;"> 4.31 </td> <td style="text-align:center;"> 25523.71 </td> <td style="text-align:center;"> 0.16 </td> <td style="text-align:center;"> -25515.71 </td> </tr> </tbody> </table> --- ## Diagnostic Plots for Suicide-Ethnicity Model 5 (M5-S-E) <br></br> <img src="data:image/png;base64,#./figures/suicide_ethnicity_M5.png" width="100%" style="display: block; margin: auto;" /> .right[.caption[Blue line: empirical density from the data; red line: density from the fitted model]] --- ## Coefficients: Suicide-Race, Model 8 (M8-S-R) <img src="data:image/png;base64,#./figures/suicide_race_M8_coe.png" width="100%" style="display: block; margin: auto;" /> --- ## Diagnostic Plots for Suicide-Race Model 8 (M8-S-R) <br></br> <img src="data:image/png;base64,#./figures/suicide_race_M8.png" width="100%" 
style="display: block; margin: auto;" /> .right[.caption[Blue line: empirical density from the data; red line: density from the fitted model]] --- ## Coefficients: Homicide-Race, Model 2 (M2-H-R) <img src="data:image/png;base64,#./figures/homicide_race_M2_coe.png" width="100%" style="display: block; margin: auto;" /> --- ## Diagnostic Plots for Homicide-Race Model 2 (M2-H-R) <br></br> <img src="data:image/png;base64,#./figures/homicide_race_M2.png" width="100%" style="display: block; margin: auto;" /> .right[.caption[Blue line: empirical density from the data; red line: density from the fitted model]] --- ## Coefficients: Homicide-Ethnicity, Model 2 (M2-H-E) <img src="data:image/png;base64,#./figures/homicide_ethnicity_M2_coe.png" width="100%" style="display: block; margin: auto;" /> --- ## Diagnostic Plots for Homicide-Ethnicity Model 2 (M2-H-E) <br></br> <img src="data:image/png;base64,#./figures/homicide_ethnicity_M2.png" width="100%" style="display: block; margin: auto;" /> .right[.caption[Blue line: empirical density from the data; red line: density from the fitted model]] --- ## Coefficients: Other-Race, Model 2 (M2-O-R) <img src="data:image/png;base64,#./figures/other_race_M2_coe.png" width="100%" style="display: block; margin: auto;" /> --- ## Diagnostic Plots for Other-Race Model 2 (M2-O-R) <br></br> <img src="data:image/png;base64,#./figures/other_race_M2.png" width="100%" style="display: block; margin: auto;" /> .right[.caption[Blue line: empirical density from the data; red line: density from the fitted model]] --- ## Coefficients: Other-Ethnicity, Model 2 (M2-O-E) <img src="data:image/png;base64,#./figures/other_ethnicity_M2_coe.png" width="100%" style="display: block; margin: auto;" /> --- ## Diagnostic Plots for Other-Ethnicity Model 2 (M2-O-E) <br></br> <img src="data:image/png;base64,#./figures/other_ethnicity_M2.png" width="100%" style="display: block; margin: auto;" /> .right[.caption[Blue line: empirical density from the data; red line: density 
from the fitted model]]
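---

## Reading the Relative Risk Column

The relative-risk values in the coefficient tables for M6-S-R and M5-S-E agree with exponentiating the reported coefficient estimates, which is the usual reading under a log link. This is inferred from the reported numbers rather than stated on the slides; a worked check under that assumption:

```latex
\begin{aligned}
\mathrm{RR}_j &= \exp(\hat{\beta}_j) \\
\text{Race R2 (M6-S-R):}\quad \exp(-0.46) &\approx 0.63 \\
\text{Race R3 (M6-S-R):}\quad \exp(-0.27) &\approx 0.76 \\
\text{Race R4 (M6-S-R):}\quad \exp(0.75) &\approx 2.12 \\
\text{Ethnicity E2 (M5-S-E):}\quad \exp(-0.33) &\approx 0.72 \\
\text{Sex Male (M5-S-E):}\quad \exp(1.26) &\approx 3.53
\end{aligned}
```

Read this way, 63% means the modeled rate is about 63% of the baseline rate, and 212% means slightly more than double the baseline rate.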